Analysis of Selective Strategies to Build a Dependency-Analyzed Corpus
نویسنده
چکیده
This paper discusses sampling strategies for building a dependency-analyzed corpus and analyzes them with different kinds of corpora. We used the Kyoto Text Corpus, a dependency-analyzed corpus of newspaper articles, and prepared the IPAL corpus, a dependency-analyzed corpus of example sentences in dictionaries, as a new and different kind of corpus. The experimental results revealed that the length of the test set controlled the accuracy and that the longest-first strategy was good for an expanding corpus, but this was not the case when constructing a corpus from scratch.
منابع مشابه
Older Adults as Discursively Constructed in Taiwanese Newspapers: A Critical Discourse Analysis
This paper uses critical discourse analysis to examine discursive representations of older people in Taiwanese newspapers. A total of 926 references to older people were sampled from 62 articles published in four Taiwanese newspapers from January to August 2013. The findings suggest that, older people were frequently allocated roles suggestive of dependency. Those portrayed in line with the pos...
متن کاملAnnotation of Discourse Connectives for the Prague Dependency Treebank
The paper presents a preliminary study on discourse connectives (DC) in Czech. Aiming to build a computerized language corpus capturing discourse relations in Czech, we base our observations on current foreign projects with the same purpose. In this study, first, the different methods of linguistic analysis of the discourse structure and discourse connectives are described, next, the nature and...
متن کاملStrategies Used in the Translation of Interlingual Subtitling
This study was an attempt to identify the interlingual strategies employed to translate English subtitles into Persian and to determine their frequency, as well. Contrary to many countries, subtitling is a new field in Iran. The study, a corpus-based, comparative, descriptive, non-judgmental analysis of an English-Persian parallel corpus, comprised English audio scripts of five movies of differ...
متن کاملGearing the Discursive Practice to the Evolution of Discipline: Diachronic Corpus Analysis of Stance Markers in Research Articles’ Methodology Section
Despite widespread interest and research among applied linguists to explore metadiscourse use, very little is known of how metadiscourse resources have evolved over time in response to the historically developing practices of academic communities. Motivated by such an ambition, the current research drew on a corpus of 874315 words taken from three leading journals of applied linguistics in orde...
متن کاملDependency Analysis of Japanese Spoken Language via SVM
This paper discuss a dependency analyzer employing Support Vector Machines (SVMs) for Japanese spoken language. Most conventional dependency analyzers target written texts. Thus, we use a currently available spoken language corpus and make the SVMs learn the corpus to build a dependency analyzer that targets spoken language. We used two types of corpora: one contains written language, and the o...
متن کامل